Minimizer of the Reconstruction Error for multi-class document categorization
Abstract
In this article we introduce and validate an approach to single-label multi-class document categorization based on text content features. The approach exploits a statistical property of Principal Component Analysis: it minimizes the reconstruction error of the training documents used to compute a low-rank category transformation matrix. This matrix transforms the training documents of a given category into a new low-rank space and then reconstructs them in the original space with minimum reconstruction error. The proposed method, called the Minimizer of the Reconstruction Error (mRE) classifier, extends this property and applies it to new, unseen test documents. Experiments on four multi-class text categorization datasets are conducted to assess the stable and generally better performance of the proposed approach in comparison with other popular classification methods.
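The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that one PCA basis is fitted per category on mean-centered training documents, and that a test document is assigned to the category whose basis reconstructs it with the smallest squared error; the abstract does not spell out the decision rule, so that rule is an assumption. The function names (`fit_category_basis`, `reconstruction_error`, `fit`, `predict`) and the `rank` parameter are hypothetical.

```python
import numpy as np


def fit_category_basis(X, rank):
    """Return (mean, basis) for one category; X has shape (n_docs, n_features)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered documents.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank]


def reconstruction_error(x, mean, basis):
    """Squared error after projecting x into the low-rank space and back."""
    centered = x - mean
    reconstructed = basis.T @ (basis @ centered)
    return float(np.sum((centered - reconstructed) ** 2))


def fit(X, y, rank):
    """Fit one low-rank model per category label."""
    return {label: fit_category_basis(X[y == label], rank) for label in np.unique(y)}


def predict(models, x):
    """Assumed decision rule: choose the category with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, *models[label]))
```

In practice `X` would be a document-term matrix (for example TF-IDF weights) and `rank` would be tuned per dataset; both choices are outside what the abstract specifies.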
Similar resources
Multi-class Text Categorization with Error Correcting Codes
Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. In t...
Multi-class Classification with Error Correcting Codes
Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. I...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Error-Correcting Output Codes for Multi-Label Text Categorization
When a sample belongs to more than one label from a set of available classes, the classification problem (known as multi-label classification) becomes more complicated. Text data, widely available nowadays on the World Wide Web, is an obvious example of such a task. This paper presents a new method for multi-label text categorization created by modifying the Error-Correcting Output...
Improving Multiclass Text Classification with Error-Correcting Output Coding and Sub-class Partitions
Error-Correcting Output Coding (ECOC) is a general framework for multiclass text classification with a set of binary classifiers. It can not only help a binary classifier solve multi-class classification problems, but also boost the performance of a multi-class classifier. When building each individual binary classifier in ECOC, multiple classes are randomly grouped into two disjoint groups: po...
Journal:
- Expert Syst. Appl.
Volume 41, Issue
Pages -
Publication date 2014